Update seastar for v25.3.x#228
Merged
piyushredpanda merged 86 commits intoredpanda-data:v25.3.xfrom Aug 11, 2025
Merged
Conversation
Seastar http server implementation supports multiple listeners. It may be required for the handler logic to know which listener the connection is coming from. Added listener_idx field to `httpd::request` to allow handler recognize listener. Signed-off-by: Michal Maslanka <michal@vectorized.io>
Since an exception carries some text for the response body text, the raising site might like to specify the content type if it's e.g. json. Signed-off-by: John Spray <jcs@vectorized.io>
This enables throwing a base_exception from a json request handler with a json payload inside it. Signed-off-by: John Spray <jcs@vectorized.io>
Signed-off-by: John Spray <jcs@vectorized.io>
Prior to this patch seastar only exposes one global metrics::impl::impl object which holds all metric related data for one application. This patch changes the implementation details such that multiple metrics::impl::impl objects can exist for any given application. Said objects are stored into a map on each shard and created dinamically whenever requested. A metrics::impl::impl is identified by an integer handle that acts as the key for the storage map. Implementation note: in order to avoid issues caused by the ordering of static thread_local objects I had to declare the storage in reactor.cc. (cherry picked from commit 585a8af)
This patch extends the metrics internal apis to use a specific metrics::impl::impl object identified by its integer handle. (cherry picked from commit 6ee4af7)
Add a public method to metric_groups_impl that exposes the handle of the internal implementation it is using. This is required in order for the metric_groups class to be able to reset itself to the configured implementation handle.
This patch extends the metrics user facing apis to use a specific metrics::impl::impl object identified by its integer handle. Note that the constructor of 'metric_groups' is marked explicit in this patch and updates two call sites where the constructor was used implicitly.
This patch removes two subsequent calls to `get_local_impl` and reuses the returned handle in that scope.
This patch extends the user facing prometheus apis allowing the user to specify the internal metrics implementation to be used through a handle. Additionally, 'add_prometheus_routes' now takes an argument that specifies the route on which to advertise the metrics. This enables different metrics "namespaces" to be served by different endpoints in isolation. (cherry picked from commit 6189522)
This patch extends the scollectd apis with the ability to select the internal metrics implementation to be used by providing a handle. (cherry picked from commit d4331d1)
This patch adds a 'get_skip_when_empy' getter to the 'registered_metric' class. It is used by follow-up patches in order to replicate metrics.
This patch adds private methods to the 'metrics::impl' class that deal with the creation of replicated metrics. They will be used to build the public api in future commits.
This patch adds private helpers to 'metrics::impl' that deal with the removal of replicated metric families from their destintation implementation. These methods will be used in subsequent commits to manage the lifetime of replicated metrics.
This patch adds a public method to the 'metrics::impl' class: 'set_metric_families_to_replicate'. When this method is called the families that match any of the specifications will be replicated on the specified destinations.
This patch extends the metric registration and unregistration processes to make them aware of metric replication. In the case of metric registration, if the new metric belongs to a family that matches one of the replication specs, then a replicated metric is created accordingly. For unregistration of a metric, the replicated metric is unregistered too if one exists.
This patch exposes a method in the public interface of the metrics
module ('replicate_metric_families'), which enables metric replication
internally for the requested metric families.
Extends the metrics api to allow changing the aggregation labels of a metrics family. Otherwise one had to un-register every single metric instance in a metric family and then re-register with the changed aggregation labels. For metric families with thousands of instances (e.g.: histograms with lots of different labels) this is quite expensive. With this change we avoid the full reconstruction of the metrics family and all its metrics. Only the work associated with marking the metrics `dirty()` is needed then.
There was a problem hiding this comment.
Pull Request Overview
Updates the Seastar library to v25.3.x with significant cryptographic infrastructure changes and comprehensive test infrastructure improvements.
Key changes:
- Replaces OpenSSL implementation with a new TLS provider abstraction supporting both OpenSSL and GnuTLS
- Adds CPU profiler functionality with comprehensive testing and signal handling
- Introduces metric family replication capabilities and Prometheus integration testing
Reviewed Changes
Copilot reviewed 71 out of 71 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tls_test.cc | Updates TLS tests to support dual crypto provider architecture with OpenSSL and GnuTLS branches |
| tests/unit/stall_detector_test_utilities.hh | Extracts reusable stall detector testing utilities to shared header |
| tests/unit/stall_detector_test.cc | Refactors to use shared utilities and removes code duplication |
| tests/unit/prometheus_test.cc | Adds comprehensive testing for Prometheus metrics HTTP endpoint |
| tests/unit/metric_family_replication_test.cc | Introduces testing for metrics replication across handles |
| tests/unit/cpu_profiler_test.cc | Comprehensive CPU profiler testing with signal handling and timing validation |
| tests/unit/cpu_profiler_alloc_test.cc | Tests CPU profiler signal handler memory allocation safety |
| tests/unit/CMakeLists.txt | Updates build configuration for new tests and certificate generation |
| tests/perf/linux_perf_event.cc | Removes assertion on perf event read result |
| src/websocket/common.cc | Adds dual crypto provider support for WebSocket SHA1 and base64 operations |
| src/util/backtrace.cc | Adds guarded backtrace function with signal safety |
| src/seastar.cc | Updates includes for new TLS implementation structure |
| src/rpc/rpc.cc | Fixes buffer construction to use explicit constructor |
| src/net/tls.cc | Major refactor splitting TLS implementation with provider abstraction |
| src/net/tls-impl.hh | New header defining TLS implementation interfaces and common structures |
| src/net/tls-impl.cc | Implementation of shared TLS functionality and credential management |
| src/net/posix-stack.cc | Fixes namespace qualification for format function calls |
Comments suppressed due to low confidence (1)
tests/unit/stall_detector_test_utilities.hh:77
- [nitpick] The function name 'spin_user_hires' is ambiguous. Consider renaming to 'spin_user_high_resolution' to clarify that it uses high resolution clock for user-space spinning.
[[maybe_unused]] void spin_user_hires(std::chrono::duration<double> how_much) {
7 tasks
We simply ignore premature termination error as it is GnuTLS specific. Signed-off-by: Michał Maślanka <michal@redpanda.com>
sigev_notify_thread_id is defined as a macro by musl libc, FreeBSD, and the Linux kernel. The macro is for accessing the corresponding field in sigevent. see https://sourceware.org/bugzilla/show_bug.cgi?id=27417 (cherry picked from commit bbe1af3) tdowns note: this is a re-pick of the above as the function moved so the original was not applied
Created tls-impl.cc and tls-impl.h which contains common structures and definitions that are not dependent on the underlying TLS mechanism. These changes set the stage for implementing other TLS providers. Signed-off-by: Michael Boquard <michael@redpanda.com>
This commit adds support for using OpenSSL, instead of GnuTLS, as the TLS provider within Seastar. To support this change, the configure script has been updated to allow users to select which cryptographic provider should be used by supply `--crypto-provider` and specificying either `OpenSSL` or `GnuTLS`. The OpenSSL implementation mirrors the GnuTLS implementation. Instead of using callbacks, a custom BIO was created to handle moving data on/off of the OpenSSL SSL session into the Seastar TLS session data sinks. When compiled for OpenSSL, the `certificate_credentials::set_priority_string` method is compiled out and replaced with the following: * `set_cipher_string` * `set_ciphersuites` * `enable_server_precedence` * `set_minimum_tls_version` * `set_maximum_tls_version` These methods are specific to OpenSSL. The github actions have been updated to run the full suite of tests against both cryptographic providers. `src/net/tcp.hh` and `src/websocket/server.cc` have been updated to use OpenSSL instead of GnuTLS, depending upon the build configuration. Signed-off-by: Michael Boquard <michael@redpanda.com>
Added pretty-print capabilities to seastar::tls::session for OpenSSL and added a number of log statements that may be helpful if debugging the implementation. Signed-off-by: Michael Boquard <michael@redpanda.com>
Prior to this commit, in certain situations, the put() and get() methods would 'drop into' handshake() if they determined that a re-negotiation was occurring. This was unecessary as OpenSSL provides helpful error codes like SSL_WANT_READ and SSL_WANT_WRITE which indicates if OpenSSL needed to send/receive non-user data to handle re-negotiations. This may have also led to a rare situation where the put() method would assert because the _output_pending future had not resolved. This change properly handles the SSL_WANT_READ and SSL_WANT_WRITE conditions and ensures that the _output_pending future resolves. Signed-off-by: Michael Boquard <michael@redpanda.com>
More recent versions of OpenSSL requrire CA certificates to have CA:true Signed-off-by: Michael Boquard <michael@redpanda.com>
Now handling situations where the get() call doesn't throw but does return an empty buffer indicating EOF. Signed-off-by: Michael Boquard <michael@redpanda.com>
TLS renegotiation was removed in TLSv1.3 due to issues around security. This change makes that the default behavior, however a method is exposed to re-enable it if desired. This was only added to OpenSSL as GnuTLS does not provide a means to fully disable renegotiation. Signed-off-by: Michael Boquard <michael@redpanda.com>
Added a dn_format flag to switch between legacy and RFC2253 format. Legacy format matches that originally returned by the GnuTLS api `gnutls_x509_crt_get_dn` which returns the DN in the format of `C=US,ST=London,L=London,O=Redpanda Data,OU=Core,CN=id`. RFC2253 format, effectively reverses the order: `CN=id,OU=Core,O=Redpanda Data,L=London,ST=London,C=US`. Signed-off-by: Michael Boquard <michael@redpanda.com>
We need to build with OpenSSL for the TLS tests to work on our fork.
Seastar has a C++20 modules variation of the build in GHA, but our changes have left it quite broken as we don't use modules. Disable this build for now so we can get the build green and protect the seastar branch.
This commit introduces an alternative reload callback prototype which includes the credentials that were reloaded. In service of that, also adds `reloadable_credentials_base::as_certificate_credentials` to produce a const-ref to `certificate_credentials`, suitable for exposure through the new callback. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
This commit introduces `struct cert_info` which encapsulates useful information about TLS certificates, currently: - The certificate's serial number (as assigned by the signing CA) - The certificate's expiration time Additionally, this commit adds public API on `tls::certificate_credentials` to access this information for loaded certificates AND the loaded trust list. The motivation for this change is to give applications the ability to monitor and report on certificate and CA lifetimes on demand, the idea being that an application may cache serial numbers and expiries when credentials are reloaded, serving up a summary of those details until the next reload time. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
GnuTLS provides serial numbers not as a legible string of hex digits but as a byte array corresponding to an integer value up to 20 octets wide. The cert_info interface should reflect this. To that end, this commit updates the type of `cert_info::serial` to an non-null-terminated basic_sstring of uint8_t. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Passed up to application code as a tls::blob (basic_string_view<char>). Also provide the ability to fetch this blob directly via credentials_builder::get_trust_file_blob() Updates unit tests to accommodate the new reload callback signature. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
char_traits<unsigned> is deprecated in LLVM 18, slated for removal in LLVM 19. Therefore remove this non-standard usage from cert_info. Extracting the x509 serial number is a relatively infrequent operation and the result is always <= 20B, so the cost of using a vector here should be marginal. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
We disable interception of _Unwind_RaiseException when ASAN is enabled since it can produce a large number of stack-related false positives when it is enabled and exceptions are thrown. Fixes scylladb#2892.
Redpanda builds seastar with depcrecations as errors. However, some of the newly deprecated functions in Seastar are used heavily in RP. Hence this commit temporarily comments those annotations out until their usage is removed in RP.
Pure non-functional refactor to split the long line into three. (cherry picked from commit a0155ba)
Switch the io-queue cost function over to use max(IOPS cost, TP cost) away from sum(IOPS cost, TP cost). In its form today the io-queue fails to reach the advertised throughput/IOPS numbers from io-properties (see [seastar#2771](scylladb#2771) for details). The reason for this is easy to understand when looking at a disk like the ones from the m7gd AWS series. They (approximately) achieve full throughput as well as full IOPS at 4k IOP size. This means that for cost modelling for a single IO operation it would be enough to just look at one dimension, either throughput or IOPS but not both. The current cost function which is always additive in IOP and throughput cost however costs operation way too high. This leads to the aforementioned problem where the io-queue underperforms. Taking 4k IOPS on m7gd as an example we only get around 60% of the disk performance. From various testing the max cost function seems to model disks better than the sum cost function and avoids the problem mentioned above. Note this doesn't change the "outer" cost model of how the costs of individual operations are summed together. That also isn't modelled entirely right for more modern disks but is pessimistically fine and not the focus of this patch. The problem can be easily shown using the following io-tester config: ``` - name: highprio shards: [0] type: randwrite shard_info: parallelism: 32 reqsize: 4kB|16kB|128kB shares: 1000 think_time: 0 - name: lowprio shards: [0] type: randwrite shard_info: parallelism: 32 reqsize: 4kB|16kB|128kB shares: 100 think_time: 0 ``` On m7gd.4xl (all numbers in MB/s): - NOIP: Not using io-properties - IP: Using io-properties - MCF: Using io-properties with the max cost function | | 4k | | | 16k | | | 128k | | | | | high prio | low prio | total | high prio | low prio | total | high prio | low prio | total | | NOIP | 303 | 287 | 591 | 299 | 292 | 591 | 295 | 295 | 590 | | IP | 262 | 28 | 290 | 432 | 44 | 476 | 528 | 53 | 581 | | MCF | 476 | 56 | 532 | 543 | 55 | 598 | 544 | 55 | 599 | We see that at low IOP size using no io-properties we do get the full throughput but no scheduling (bad), using io-properties we do get scheduling but low throughput and with the max cost function we get both. More data and extensive discussion can be found here: - scylladb#2801 (comment) - scylladb#2840 (comment) - https://docs.google.com/spreadsheets/d/1u_9tRxePkq-OYFiba8xz4JY54P2VJ-JQBvhuBv04tmo These tests also include latency tests in priority scenarios and different kind of disks like EBS. The linked PR also includes a discussion about a different approach using a weighted cost function. Note that in general latency is often a function of throughput as the io-queue arbitrarily keeping things in the queue induces latency. This is especially true for real world usecases like RP where write amplification is generally fairly high but at the same time can be absorbed at the cost of latency. (cherry picked from commit 9bf2094)
58ce918 to
68c3a6f
Compare
Author
|
Changes in force-push:
|
pgellert
approved these changes
Aug 11, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Notes for reviewer: